Bio-mathematics, Statistics, and Nano-Technologies Mosquito Control Strategies (Peyman Ghaffari)


Table 9.2: The architecture of ANN models aimed for prediction of Rindex based on physicochemical properties and some statistical parameters of the networks.

Network       Rtrain   Rtest    Rvalid   RMSEtrain  RMSEtest  RMSEvalid  Training    Hidden activation  Output activation
architecture                                                             algorithm   function           function
MLP 3-6-1     0.9183   0.9116   0.9998   100.7      12.8      38.6       BFGS 24*    Tanh               Exponential
MLP 3-8-1     0.9396   0.8294   0.9989   75.6       10.4      67.3       BFGS 24*    Logistic
MLP 3-6-1     0.9103   0.6379   0.9990   117.5      31.3      104.7      BFGS 10*    Exponential        Logistic

*the number of training cycles after which the best network architecture is reached
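To make the notation in Table 9.2 concrete, the following is a minimal sketch of the forward pass of an MLP 3-6-1 network with a Tanh hidden activation and an Exponential output activation, as listed for the first network. The function name `mlp_3_6_1` and the randomly drawn weights are hypothetical and serve only to show the shapes involved; the trained weights of the reported networks are not given in the text, and the BFGS training step is not shown.

```python
import numpy as np


def mlp_3_6_1(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 3 inputs -> 6 tanh hidden neurons -> 1 exponential output."""
    x = np.asarray(x, dtype=float)
    hidden = np.tanh(x @ w_hidden + b_hidden)   # shape (n_samples, 6)
    return np.exp(hidden @ w_out + b_out)       # shape (n_samples, 1), always > 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical, untrained weights used only to illustrate the architecture.
    w_hidden = rng.normal(size=(3, 6))
    b_hidden = rng.normal(size=6)
    w_out = rng.normal(size=(6, 1))
    b_out = rng.normal(size=1)
    descriptors = rng.normal(size=(5, 3))       # 5 compounds, 3 descriptors each
    print(mlp_3_6_1(descriptors, w_hidden, b_hidden, w_out, b_out))
```

An MLP 3-8-1 network differs only in the width of the hidden layer (8 instead of 6 neurons), and a Logistic activation would replace `np.tanh` or `np.exp` with `1 / (1 + np.exp(-z))`.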

et al. 2008. The obtained high-quality model is intended for predicting the repellent activity of novel compounds structurally similar to those used in the ANN modeling.

9.3.4 Mathematical validation of QSAR models

A mathematical model cannot be considered reliable or applicable unless it has been statistically validated with a proper validation approach (Gramatica and Sangion, 2016; Chirico and Gramatica, 2012; Chirico and Gramatica, 2011). Standard validation parameters include the Pearson correlation coefficient (R), the determination coefficient (R²), the adjusted determination coefficient (R²adj), the Fisher test (F-value), the root mean square error (RMSE) and the probability (p-value). As part of the internal validation of the models, cross-validation (also known as out-of-sample testing) is often applied. In this heuristic validation method, one or more objects are omitted from the set, the model is rebuilt on the compounds remaining in the training set, and the activity of the removed compounds is then estimated with the newly established QSAR model (Gramatica and Sangion 2016; Chirico and Gramatica 2012; Chirico and Gramatica 2011). These cycles are repeated for all the compounds in the training set. Finally, the predictivity of the QSAR model is judged on the basis of the following parameters: the cross-validation determination coefficient (R²cv), the total sum of squares (TSS), the predicted residual error sum of squares (PRESS), the PRESS/TSS ratio and the predicted standard deviation (SDPRESS).
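The fit statistics and the leave-one-out cycle described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used by the authors: the model is taken to be an ordinary least-squares linear model, RMSE is computed as sqrt(RSS/n), and SDPRESS is computed as sqrt(PRESS/n), although other conventions divide by the residual degrees of freedom instead. The function names `fit_statistics` and `loo_cross_validation` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_statistics(y, y_hat, p):
    """R2, adjusted R2, F-value and RMSE for a model with p descriptors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)            # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    f_value = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    rmse = np.sqrt(rss / n)                   # assumption: divide by n
    return {"R2": r2, "R2adj": r2_adj, "F": f_value, "RMSE": rmse}


def loo_cross_validation(X, y):
    """Leave-one-out cross-validation of an ordinary least-squares model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # prepend the intercept column
    y_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i              # omit compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        y_pred[i] = A[i] @ coef               # predict the omitted compound
    press = float(np.sum((y - y_pred) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    return {
        "PRESS": press,
        "TSS": tss,
        "PRESS/TSS": press / tss,
        "R2cv": 1.0 - press / tss,
        "SDPRESS": float(np.sqrt(press / n)),  # assumption: divide by n
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))              # synthetic descriptors
    y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=20)
    print(loo_cross_validation(X, y))
```

For a least-squares model PRESS is never smaller than the residual sum of squares of the full fit, so R²cv cannot exceed R²; a large gap between the two is a common sign of overfitting.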

One of the most reliable approaches to validating QSAR models is external validation, in which an external test set is kept out of the training set and, once the modeling is complete, is used to test the real predictivity of the QSAR model. The parameters of external validation are the determination coefficient of the prediction of the external set (R²ext), mathematically different determination coefficients for the external validation, including Q²F1, Q²F2, Q²F3 and r²m, as well as the concordance correlation coefficient (CCC). A detailed explanation of the validation of QSAR models can be found elsewhere (Gramatica and Sangion 2016; Chirico and Gramatica 2012; Chirico and Gramatica 2011).
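As an illustration of how the external validation parameters differ from one another, the sketch below computes Q²F1, Q²F2, Q²F3 and CCC from the observed training responses, the observed external responses and the external predictions, using their commonly cited definitions. The function name `external_validation` is hypothetical, R²ext is assumed here to be the squared Pearson correlation between observed and predicted external values, and r²m is not included.

```python
import numpy as np


def external_validation(y_train, y_ext, y_ext_pred):
    """External validation metrics for a QSAR model (illustrative sketch)."""
    y_train = np.asarray(y_train, dtype=float)
    y_ext = np.asarray(y_ext, dtype=float)
    y_ext_pred = np.asarray(y_ext_pred, dtype=float)
    n_tr, n_ext = len(y_train), len(y_ext)

    # Sum of squared prediction errors on the external set.
    press_ext = np.sum((y_ext - y_ext_pred) ** 2)

    # Q2F1: reference is the training-set mean, summed over the external set.
    q2_f1 = 1.0 - press_ext / np.sum((y_ext - y_train.mean()) ** 2)
    # Q2F2: reference is the external-set mean.
    q2_f2 = 1.0 - press_ext / np.sum((y_ext - y_ext.mean()) ** 2)
    # Q2F3: mean squared external error relative to the training-set variance.
    tss_train = np.sum((y_train - y_train.mean()) ** 2)
    q2_f3 = 1.0 - (press_ext / n_ext) / (tss_train / n_tr)

    # Concordance correlation coefficient between observed and predicted values.
    dx = y_ext - y_ext.mean()
    dy = y_ext_pred - y_ext_pred.mean()
    ccc = 2.0 * np.sum(dx * dy) / (
        np.sum(dx ** 2) + np.sum(dy ** 2)
        + n_ext * (y_ext.mean() - y_ext_pred.mean()) ** 2
    )

    # Assumption: R2ext taken as the squared Pearson correlation.
    r2_ext = np.corrcoef(y_ext, y_ext_pred)[0, 1] ** 2

    return {"R2ext": float(r2_ext), "Q2F1": float(q2_f1), "Q2F2": float(q2_f2),
            "Q2F3": float(q2_f3), "CCC": float(ccc)}
```

The three Q² variants share the same numerator and differ only in the reference variance in the denominator, which is why they can disagree when the external set is small or is distributed differently from the training set. Unlike R²ext, CCC also penalizes a systematic shift between observed and predicted values.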

The statistical parameters of the established linear models from subsection 3.1.1 are presented in Table 2. The results indicate that the MLR2 model has the highest predictivity, since it has the highest R²cv coefficient and the lowest error (the lowest RMSE), while the ULR model gives the largest prediction error and may be used for approximate estimation of the Rindex of compounds structurally similar to those used in the model's calibration. According to the values of R²adj it can be concluded